This type of pragmatic inference can likely be useful for a wide range of NLP applications that require
accurate anticipation of people’s intents and emotional reactions, even when they are not expressly
mentioned. For example, an ideal dialogue system should react in empathetic ways by reasoning
about the human user’s mental state based on the events the user has experienced, without the user
explicitly stating how they are feeling. Furthermore, advertisement systems on social media should
be able to reason about the emotional reactions of people after events such as mass shootings and
remove ads for guns, which might increase social distress. Pragmatic inference is also a
necessary step toward automatic narrative understanding and generation. However, this type of
commonsense social reasoning goes far beyond the widely studied entailment tasks and thus falls
outside the scope of existing benchmarks.
Q2. What is SWAG in NLP?
Answer:
SWAG stands for Situations With Adversarial Generations, a dataset consisting of 113k multiple-
choice questions about a rich spectrum of grounded situations.
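To make the task format concrete, here is what a SWAG-style item might look like. This is only an illustration: the field names and the three distractor endings below are hypothetical, not the dataset's exact schema or real entries.

```python
# Illustrative SWAG-style multiple-choice item (hypothetical field
# names and distractors): one context sentence, four candidate
# endings, and the index of the single gold (correct) continuation.
swag_item = {
    "context": "He opened the hood of the car.",
    "endings": [
        "Then, he examined the engine.",      # gold continuation
        "Then, he closed the umbrella.",      # made-up distractor
        "Then, he painted the ceiling.",      # made-up distractor
        "Then, he boarded the airplane.",     # made-up distractor
    ],
    "label": 0,  # index of the correct ending
}
```

A model is scored on whether it picks the ending at index `label` out of the four candidates.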
Swag: A Large Scale Adversarial Dataset for Grounded Commonsense Inference
According to the SWAG research paper, given a partial description like "he opened the hood
of the car," humans can reason about the situation and anticipate what might come next ("then, he
examined the engine"). The paper introduces the task of grounded commonsense inference,
unifying natural language inference (NLI) and commonsense reasoning.
We present SWAG, a dataset with 113k multiple-choice questions about a rich spectrum of
grounded situations. To address recurring challenges of annotation artifacts and human biases found
in many existing datasets, we propose Adversarial Filtering (AF), a novel procedure that constructs a
de-biased dataset by iteratively training an ensemble of stylistic classifiers and using them to filter
the data. To account for the aggressive adversarial filtering, we use state-of-the-art language models
to massively oversample a diverse set of potential counterfactuals. Empirical results demonstrate that
while humans can solve the resulting inference problems with high accuracy (88%), various
competitive models struggle on our task. We provide a comprehensive analysis that indicates
significant opportunities for future research.
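The filtering loop described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' actual implementation: the data layout (`gold`, `kept`, `pool` fields), the `train_classifier` callback, and the scoring comparison are all simplifications introduced here for illustration.

```python
import random

def adversarial_filter(items, train_classifier, n_rounds=5, seed=0):
    """Toy Adversarial Filtering loop: each round, train a classifier
    on a random half of the data and use it to replace the easy-to-spot
    negative endings in the other half with harder candidates drawn
    from an oversampled pool of machine-written endings."""
    rng = random.Random(seed)
    for _ in range(n_rounds):
        rng.shuffle(items)
        split = len(items) // 2
        # train_classifier returns a scoring function (assumption):
        # higher score means "looks more like a real, human ending".
        clf = train_classifier(items[:split])
        for item in items[split:]:
            for i, cand in enumerate(item["kept"]):
                # A negative the classifier ranks well below the gold
                # ending is stylistically "easy"; swap in a harder one
                # from the reserve pool, if any remain.
                if clf(cand) < clf(item["gold"]) and item["pool"]:
                    item["kept"][i] = item["pool"].pop()
    return items
```

Because each round trains on a fresh random split, no single classifier's quirks survive the process; what remains are negatives that an ensemble of such classifiers cannot separate from the gold ending on style alone.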
When we read a story, we bring to it a large body of implicit knowledge about the physical world. For
instance, given the context “on stage, a man takes a seat at the piano,” we can easily infer what the
situation might look like: a man is giving a piano performance, with a crowd watching him. We can
furthermore infer his likely next action: he will most likely set his fingers on the piano keys and start
playing.